METHODS OF IDENTIFYING THE ACTIVITY OF GENE PRODUCTS

This application claims priority to United States provisional patent 
application Serial No. 60/202,912 filed May 9, 2000 which is incorporated herein 
by reference in its entirety. 

FIELD OF THE INVENTION 

This invention relates to a general method for identifying the activity or 
function of gene products by identifying peptide binding partners which cause a 
cellular response in cells expressing the gene products. Accordingly, this 
invention is useful for determining the influence which specific genotypes have 
on phenotypes. In particular, the invention is concerned with a method of 
obtaining peptides which bind to a target such as a novel gene product. Such 
peptides 1) provide sequences that may be used to identify the natural protein
partner of the target, and 2) enable synthesis of peptides which alter the
phenotype of cells expressing the target. This invention also relates to providing 
material useful for conducting competitive binding assays capable of identifying 
small molecules reactive with and modulatory of the target protein. 

BACKGROUND OF THE INVENTION 

Present estimates of the number of different genes range from over 30,000 to
over 150,000 if one considers splicing variants. Despite rapid progress
at identifying genes based on analysis of the human genome, progress in
identifying the activity and function of the gene products lags significantly behind.
A number of methods have been reported for connecting specific genes with 
specific diseases or conditions of pharmaceutical interest. One general method, 
referred to here as a genomic 'knock-out', eliminates, physically or by mutation, 
single base deletion or insertion, the gene in question. The earliest of these 
knock-outs in animals have been done in such a fashion that function can be lost 
in all cells derived from a single fertilized egg. Genomic knock-outs, done in
cells as well as animals, have revealed the functions of many genes.

Genomic knock-outs have severe limitations regarding providing useful
pharmacological targets. These limitations result from the occurrence of the
knock-out in many places at one time and an all or none event occurring very
early in development whereas many diseases result from timed and/or graded
alteration in gene activity. Furthermore, the gene causing the phenotypic change
often is not the best target for drug therapy, and little information may be gained
by this procedure on the best drug target. Lastly, knowing the gene target does
not necessarily provide the investigator with a simple tool for obtaining small
organic molecules which act on the target of interest and are useful for animal
phenotyping and as drug leads.

A second knock-out approach to elucidating gene function is the use of
anti-sense nucleic acids to prevent translation of mRNA into functional proteins.
In this approach, antisense molecules can be applied from without or within the
cell. Although this method has the clear advantage of being controllable with
respect to timing and graded response, it suffers from the fact that mRNA and
protein levels are not uniformly linked, therefore allowing a large degree of
variation in expected protein level manipulation. In addition, antisense approaches
are prone to non-specific artifacts which will confound the phenotypic effect.

Sequence analysis (i.e., mutation identification) and quantitation of mRNA
expression are other genomic approaches to determining gene function and
phenotype relation. Both methods, however, are associated with severe limitations
for discovering the phenotypic relationship. As noted above, RNA quantitation is
too removed from protein activity for one to rely on such information. In addition,
although mutations can provide an association between a gene and a specific
phenotype or condition, most mutations result in all or nothing events much
unlike many disease conditions of interest.

Recent information has made it clear that there are large networks of
genes coding for products which appear to be interrelated. Knocking out one of
such genes influences the level of expression of the other. These gene networks
are being elucidated via DNA chip technology which allows for the simultaneous
quantitation of mRNA from a very large number of genes. See United States
Patent 5,800,992 and WO 95/35505 which are incorporated herein by reference.
Although this information and the data bases derived from them are powerful
road maps of networking, they suffer in their inability to distinguish initial
interactions from secondary interactions from nth interactions. Although
repeated quantitative analysis over many time points may provide some of this
information of primary, secondary and later levels of protein interactions, such
increases in experimental number are costly and time consuming. Network
information on proteins of known function is useful, but is much less useful on
genes of unknown function.

Many gene products produce their effect by binding to one or more other
peptides or proteins. Presently, there are few approaches for identifying a
protein's partner, i.e., the protein with which the target gene product directly
interacts. This information is critical as most direct protein:ligand interactions,
whether non-covalent or covalent, have major consequences for protein activity,
including signaling, information transfer within and between parallel signaling
cascades, and molecular processing. Examples of such protein:ligand
interactions may include those between an enzyme and its substrate, a ligand
(peptide or not) and its receptor, or a transport protein and ligand. Regulatory
proteins also often function through binding to a molecular partner. Partner
information therefore includes knowing, for example, the ligand for its receptor,
the substrate for a 'kinase' or a protease, or the regulatory protein controlling
mRNA translation or DNA transcription.

The classical approach to partner identification is to obtain the target and
its partner in some sort of isolated complex. Newer approaches place target and
partner in two fusion proteins such that when they are complexed a signal is
generated and the fusion protein or its gene sequence is used to identify the
partner. For example, the yeast two-hybrid system has been used for partner
identification. While the yeast two-hybrid approach is popular, it has a number of
inherent problems including a high potential for false positives, the inability to use
non-protein targets such as mRNA or membrane bound/extracellular proteins,
and the inability to address posttranslational modifications on a target. Moreover,
systems based on fusion proteins in general, while powerful, are not easily
applied to a very large number of genes of unknown function, as this becomes a
random association problem and would necessitate a combinatorial approach
covering all genes.

For the subclass of proteins which interact with nucleic acids, most are 
currently identified based on information on the nucleic acid:protein complex 
which exists either in soluble form, or in gels or via some type of genetic 
recombinant expression system. Use of such an approach for the very large 
number of nucleic acid interacting proteins requires extensive efforts. 

Partner information is critical to developing a target binding assay capable
of identifying drug leads. Among the large number of assays that exist, there are in
vitro and cellular ones, and many types of binding assay formats in each case.
The vast majority of in vitro assays contain a target and a ligand, but a few require
only the target. Those without ligand suffer from not being directed to any particular
surface on the target and therefore will generate a high frequency of false
positives, i.e., compounds which bind but do not cause a change in target activity.
In the case of unknown gene targets, only the non-directed assay could be used,
which would mean a much larger effort at screening than desired.

Panning of unknown gene products with phage displayed libraries to find 
natural partners would not seem worthwhile as the published results of panning 
of known receptor genes with known partners have not shown surrogate 
peptides to have natural partner amino acid motifs, sequences etc. Panning of 
the EPO (erythropoietin) and TPO (thrombopoietin) receptors identified potent 
peptides, which after dimerization are active. However, these identified peptides 
have no natural TPO or EPO motifs and therefore fail to identify these proteins in 
database searches based on amino acid sequences. 

Housey, United States Patent 5,877,007, relates to methods and 
compositions for screening for compounds which inhibit or activate a protein of 
interest expressed by a cell relative to a control cell. 



620500 v2 






Picksley et al., U.S. Patent 5,770,377, relates to methods of identifying
compounds which interfere with the binding of an oncogene protein, such as
MDM2, to p53.

Blume, U.S. Patent 6,010,861, relates to methods and compositions for
identifying drug candidates based on the ability of the drug candidate to compete
with a reporter molecule identified from a recombinant library.

All of the publications discussed above are incorporated by reference 
herein in their entirety. 

SUMMARY OF INVENTION 

The method presented in detail in this application greatly simplifies
identifying a target protein's partners and the establishment of site directed
assays for test compounds which can regulate the activity of the target protein
and enable phenotyping in in vitro, cellular and animal model systems and
provide drug leads as well. A target protein's partner includes all naturally
occurring binding partners and any precursor polypeptides which may be
modified post translationally. A target is any naturally occurring target which may
be a peptide, a protein, a nucleic acid, a polysaccharide or a combination
thereof. A target may be, for example, a receptor, a transport protein, or a
regulatory site.

In one embodiment, the method of the invention involves the isolation of
peptides, preferably from a recombinant phage display peptide library, which
bind to the protein and nucleic acid (NA) products of genes of unknown function
and contain sufficient information to allow identification of the natural partner
protein of the target and high affinity binding peptides. This method can be
automated to increase the number of known and unknown genes and gene
products that can be used as targets. The term gene products encompasses any
post translational modifications.

In one embodiment of the invention, a method of identifying the function of
gene products is provided by detecting the phenotypic change in a cell or animal
following contact of the gene product with a binding peptide.

In another embodiment of the invention, the function of a binding peptide
and its corresponding gene product is obtained through analysis of sequence
data bases of naturally occurring protein or nucleic acid sequences. Homology
of binding peptides identified from a library which bind a novel gene product, with
a known peptide of known function, provides relevant information for determining
the function of the novel gene product.

In another embodiment, the invention relates to a method for determining
the activity of a gene product comprising the steps of 1) expressing the gene
product in at least one cell type in which the gene product is active; 2)
contacting the cells with a ligand known to bind the gene product; and 3)
detecting a change in phenotype in the cells in which the gene product is active.

Thus, this invention provides means for identifying peptide ligands
capable of activating or inhibiting gene products through their ability to bind to
such gene products as well as the activity and function of the gene products
themselves. The identification of active peptide ligands also provides means for
identifying other molecules, preferably small organic molecules, which also are
active at the sites at which the peptide ligands bind and which therefore are
useful as drug candidates. This invention provides methods for identifying the
activity of both binding partners, i.e., ligand and receptor, of gene products which
together result in a phenotypic change.

Peptide binding ligands identified through this invention directly enable
phenotyping studies in various systems, including cell, tissue, and simple
organism, of surface and intracellular targets. Attachment of cell-penetrating
peptide sequences to the peptide binding ligands provides a means for detecting
intracellular action of the peptide binding ligand. For example, the reagent
BioPORTER® from Gene Therapy Systems may be used to deliver peptide.

The present invention also provides a method to simplify and quicken the
establishment of high throughput screening (HTS) formats of
competition binding assays that can identify small organic molecules and other
test substances which are reactive with the surfaces on unknown targets and
are capable of modifying their activity. This can be used to facilitate
phenotyping in more complex models such as organisms and animals and
eventually provide leads for drug development.

DETAILED DESCRIPTION OF THE INVENTION 

In one embodiment of the invention, the method involves panning of
unknown gene protein products or other targets such as regulatory mRNA
domains with phage displayed libraries of random peptides and obtaining a set of
peptides which bind to such targets. Libraries include fully randomized libraries
as well as libraries which contain fixed amino acids at particular positions among
the other randomized amino acids. The number of peptide binders obtained may
range from about 10 sequences to the order of 100s of sequences. More
complex identification motifs may require obtaining a larger number of
sequences. The peptide binders are sequenced and used individually or as
consensus motifs to search for genes with expressed proteins of matching amino
acid sequence. Soluble binding peptide ligands, with and without penetrating
peptide additions, are obtained from those which contain natural gene motifs or
recurring novel sequences via synthetic or recombinant methods. To assess
their activity and to identify their function either as gene products themselves, or
of gene products to which they bind, they are applied to cells identified to
express the target gene. Phenotypic changes including morphological,
biochemical, genetic or immunological changes, other than changes in the target
protein itself, are then observed. The peptide binding ligands may then be
labeled and used in competitive site directed assays for small molecules which
interact at a regulatory domain of the target protein, as described in U.S.
Patent 6,010,861, incorporated herein by reference.

Another embodiment of the invention is a method of identifying a naturally
occurring binding partner or precursor for a target by identifying an amino acid
sequence motif which confers detectable binding properties of a peptide by
screening a library of expressed amino acid sequences for binding of members
to the target, identifying amino acid sequence motifs and comparing the
identified amino acid sequence motifs to known amino acid sequences of a
genome to identify a naturally occurring binding partner or precursor for said
target. Motifs are patterns of amino acids common to the amino acids of the
surrogates and the naturally occurring partner which may contain contact sites
for the target. In addition, the nucleic acid sequence for the identified naturally
occurring binding partner or precursor may be determined.

A further embodiment of the invention is a method of identifying an amino 
acid sequence motif which confers binding properties to a natural target by 
screening a library of expressed amino acid sequences for binding to the target, 
determining the amino acid sequence of the members of the library which bind to 
the target, and identifying as motifs common amino acid sequences. 

Another embodiment of the invention is a method for determining the 
activity of a gene product by expressing said gene product in a cell, contacting 
the cells with a ligand which binds said gene product, and detecting a change in 
phenotype of the cells. In addition, the invention embodies a method of
determining the phenotypic outcome of the expression of a gene product by 
expressing the gene product in cells, contacting said cells with an amino acid 
sequence that has a binding motif identified by screening members of a peptide 
library which bind to the target, and detecting a change in phenotype of the cells. 

An additional embodiment of the invention is a method of identifying a
naturally occurring binding partner or precursor for a target by identifying an
amino acid sequence that binds the target by screening a library of expressed
amino acid sequences and comparing the identified amino acid sequence to
known amino acid sequences of a genome and identifying a gene product that
possesses an amino acid sequence substantially similar to the identified amino
acid sequence. The nucleic acid of the identified naturally occurring binding
partner may also be determined.

The method of the invention has been tested using different types of 
targets wherein the partners and function are known. For example, one target 
was an extracellular protein growth and differentiation factor. Another target was 
a 5' - untranslated RNA domain. These tests are valid as neither target type has 
been panned before with peptide libraries to yield binding peptide-ligands or 
surrogates which have amino acid sequences sufficient to identify the target's 
natural and known partner. In the former case that partner is the factor's
transmembrane receptor and in the latter case a ribosomal binding protein, EIF2.

A match of a surrogate amino acid sequence, or at least a part thereof, 
with a natural sequence enables partner identification. Surrogates containing 
natural sequences likely interact with regulatory surfaces. Accordingly, these 
surrogates should be useful as antagonists and some may also be agonists. In 
either case, agonism and antagonism are readily assayable in a phenotyping
study. Antagonism is directly assayable in the presence of the natural partner or 
after addition of the natural partner to target containing systems. For those 
surrogates which do not contain natural sequence motifs, one does not know, a 
priori, whether these entities will be regulatory as the nature of their target's 
binding surface is unknown. However, analysis of surrogate libraries indicates a 
very high percentage of binders found by panning methodologies are to 
regulatory surfaces. The peptide binders are identified by competition with 
natural ligands, partners or neutralizing antibodies. In order to take into account 
the possibility of nonregulatory surrogates, phenotyping would be done with a 
small number, about six (6), surrogates with unrelated sequence motifs and 
those which modified test systems phenotypes would be used initially for site 
directed assay development. It is possible that some surrogates would only 
function as antagonists of agonistic surrogates.

Present databases and computers allow rapid searches for partners
based on surrogate sequences. Examples of available computer based
programs to analyse sequences include BLAST, Patternfind, MEME
(Multiple EM for Motif Elicitation,
http://meme.sdsc.edu/meme/website/intro.html), MAST (Motif Alignment and
Search Tool, http://meme.sdsc.edu/mem/website/mast-intro.html), ExPASy
(www.expasy.ch/) and ISREC (www.isrec.isb-sib.ch/software/software.html).
Identification of surrogates provides tools for partner identification, phenotyping
and small molecule discovery. Given that a site directed assay is available at
this early stage for the unknown target, high throughput screening allows the
rapid identification of reactive small molecules of low target affinity.
Combinatorial chemistry allows for improvements in potency which would then
provide small molecules for phenotyping and testing in animal models.

The activity and function of an unknown gene product are determined
according to a preferred embodiment of the invention as follows:

1. An unknown full length gene is expressed and the gene product
protein is isolated.

2. The gene product is then panned with a ≥20mer surrogate library
as described, for example, in U.S. Patent 6,010,861, and members of the library
which bind the gene product are isolated and sequenced.

3. The sequences of a representative number of peptide binders are
analyzed using a database search program such as BLASTp and then tBLASTn.
These searches on protein and EST databases are directed at uncovering matches
to known or unknown proteins and genes.

4. Overlapping ESTs are knitted together to obtain full length
partners.

5. Upon positive partner identification, EST databases and general
literature may be searched for information on gene expression (i.e., mRNA,
protein and activity levels):

a. in various tissues, cells, organisms;

b. in normal and pathologic states;

c. at various developmental times; and

d. other related or known proteins.

Based on partner identification the function of the expressed target gene
may be postulated. Its activity and function are then confirmed by
detecting its activity in cells in which it is expressed.
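
Step 4 above (knitting overlapping ESTs into a longer sequence) can be illustrated with a minimal sketch. This is not the procedure used by the inventors; it is a simple greedy suffix/prefix merge, and the function names `overlap` and `knit`, the minimum overlap length, and the short example fragments are all hypothetical choices made for illustration. Real EST assembly must also handle sequencing errors and reverse complements, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def knit(fragments: list[str], min_len: int = 4) -> list[str]:
    """Greedily merge the pair of fragments with the largest exact overlap.

    Merging repeats until no pair overlaps by `min_len` or more; whatever
    contigs remain are returned.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Hypothetical EST fragments, for illustration only.
    print(knit(["ATGGCCATTG", "CATTGTAATGG", "TAATGGGCC"]))
```

The greedy choice keeps the sketch short; it can mis-assemble repetitive sequence, so any contig obtained this way would still need to be checked against the source reads.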



COMPUTATIONAL APPROACH TO IDENTIFY NATURAL PARTNER

After identification of a surrogate peptide binder, it is subjected to partner
analysis using several different database search programs. In addition, the set
of multiple surrogate peptide binders is aligned into groups based on motifs or
consensus regions. Motifs and consensus regions can be identified by
sequence alignment programs like MEME (Multiple EM for Motif Elicitation),
(http://meme.sdsc.edu/meme/website/intro.html). The motifs and consensus
regions can be used as query patterns to search the available databases using
MAST (Motif Alignment and Search Tool,
http://meme.sdsc.edu/mem/website/mast-intro.html) or Patternfind. The
identified sequences can be further examined for significant differences in the
expected frequency of amino acids and the number of times a specific peptide
sequence has been repeated.
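
The grouping and frequency checks described above can be sketched in a few lines. The following is not MEME's expectation-maximization algorithm; it is a much simpler stand-in that counts ungapped k-mers shared between distinct surrogate sequences, tallies how often each clone sequence recurs, and compares observed residue frequencies with a background. The function names, the example peptides, and the uniform 1/20 background are assumptions for illustration (a real library's expected frequencies depend on its codon scheme).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shared_kmers(peptides, k=5, min_support=2):
    """Return ungapped k-mers present in at least `min_support` distinct peptides."""
    support = Counter()
    for pep in set(peptides):  # count each distinct clone sequence once
        support.update({pep[i:i + k] for i in range(len(pep) - k + 1)})
    return {kmer: n for kmer, n in support.items() if n >= min_support}


def clone_repeats(peptides):
    """Number of times each exact peptide sequence was recovered."""
    return Counter(peptides)


def residue_enrichment(peptides, background=None):
    """Observed/expected frequency ratio for each residue in the background."""
    if background is None:
        # Assumption: uniform expectation over the 20 amino acids.
        background = {aa: 1 / 20 for aa in AMINO_ACIDS}
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: (counts[aa] / total) / background[aa] for aa in background}


if __name__ == "__main__":
    # Hypothetical surrogate sequences, for illustration only.
    binders = ["ACDWSLYKGH", "TTWSLYKPQR", "ACDWSLYKGH", "MNWSLYEEVA"]
    print(shared_kmers(binders, k=4))
    print(clone_repeats(binders).most_common(1))
    print(round(residue_enrichment(binders)["W"], 2))
```

Shared k-mers found this way are only candidates; they would be passed on as query patterns to the motif and database searches described in the text.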

An example of a strategy for the computational approach to identifying a
natural partner is shown below:

In the initial step, the entire peptide sequence and consensus motifs (if
found) are entered into an Advanced BLAST search
(http://www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=1) using the following
parameters:

• Programs: blastp, tblastn

• Databases: protein and nucleotide databases including dbest
(ESTs), dbsts (STSs) and htgs (unfinished high throughput
genomic sequences)

• Expect value: 1000 or 10000 - 20000

• Matrix: PAM30 or PAM70 

• Query: Consensus motif alone and varying combinations of 
sequence at N- and C-terminal ends 
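
The query strategy listed above (the consensus motif alone, then the motif extended by varying amounts of flanking sequence at the N- and C-terminal ends, run under each program and matrix) can be enumerated as follows. This sketch only builds the list of searches; it does not contact any BLAST service. The function names `query_variants` and `search_jobs`, the `max_flank` parameter, and the example peptide are hypothetical.

```python
from itertools import product

PROGRAMS = ("blastp", "tblastn")   # programs named in the text
MATRICES = ("PAM30", "PAM70")      # matrices named in the text


def query_variants(peptide, motif, max_flank=3):
    """Motif alone, then motif plus 0..max_flank flanking residues on each side."""
    start = peptide.find(motif)
    if start < 0:
        raise ValueError("motif not found in peptide")
    end = start + len(motif)
    variants = []
    for n_flank in range(max_flank + 1):
        for c_flank in range(max_flank + 1):
            lo = max(0, start - n_flank)
            hi = min(len(peptide), end + c_flank)
            query = peptide[lo:hi]
            if query not in variants:  # flanks clipped at the termini can repeat
                variants.append(query)
    return variants


def search_jobs(peptide, motif, max_flank=3):
    """One job per (program, matrix, query) combination."""
    return [
        {"program": prog, "matrix": mat, "query": q}
        for prog, mat, q in product(PROGRAMS, MATRICES,
                                    query_variants(peptide, motif, max_flank))
    ]


if __name__ == "__main__":
    # Hypothetical surrogate and motif, for illustration only.
    for job in search_jobs("ACDWSLYKGH", "WSLYK", max_flank=1):
        print(job)
```

Short queries of this kind are the reason for the very permissive Expect values and the PAM30/PAM70 matrices given in the parameter list.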

In subsequent steps, motifs and consensus regions identified by 
sequence alignment programs like MEME are used as query patterns to search 
the available databases using Patternfind. 

For Patternfind, the following parameters are used: 

• Databases : Nonredundant, Swissprot, TREST and TRGEN 

• Limit: Between 10 and 5000

• Query: Consensus motif alone and varying combinations of 
sequence at N-and C-terminal ends 
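
A pattern search of this kind can be approximated with regular expressions. The sketch below is not Patternfind and does not use its pattern syntax; it derives a simple pattern from equal-length aligned motif instances (a fixed residue where all instances agree, a character class where they differ) and scans a small in-memory set of sequences up to a hit limit. The function names, the `limit` default, and the example sequences are hypothetical.

```python
import re


def consensus_regex(aligned):
    """Build a regex from equal-length, ungapped motif instances."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("motif instances must all have the same length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        # One residue: literal; several: character class such as [ST].
        parts.append(residues[0] if len(residues) == 1
                     else "[" + "".join(residues) + "]")
    return "".join(parts)


def scan(pattern, database, limit=10):
    """Return up to `limit` (name, position, matched text) hits."""
    hits = []
    compiled = re.compile(pattern)
    for name, sequence in database.items():
        for match in compiled.finditer(sequence):
            hits.append((name, match.start(), match.group()))
            if len(hits) >= limit:
                return hits
    return hits


if __name__ == "__main__":
    # Hypothetical motif instances and protein sequences, for illustration only.
    pattern = consensus_regex(["WSLYK", "WSLYE", "WTLYK"])
    print(pattern)
    print(scan(pattern, {"protA": "MKTWTLYEQQ", "protB": "GGGGG"}))
```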

Data obtained from the various searches are analyzed under the following 
conditions: 

• Analyze results of different searches independently and then
together to look for similar classes of proteins (e.g., nucleic acid
binding proteins, kinases) that may emerge.

• Identify some of the best matches that show up in more than one
kind of search (e.g., same protein/ORF picked up by BLAST
searches using different parameters, or by both BLAST and
Patternfind) and compare sequence of protein in this region with
other peptide surrogates containing this motif.

• Examine potential significance of protein interaction in the context 
of the cellular function of target. 
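
The cross-search comparison described above amounts to simple bookkeeping, sketched below under stated assumptions: each search is represented as a list of hit identifiers, and protein class annotations are supplied by the investigator. The function names, search labels, hit identifiers and class labels are all hypothetical.

```python
from collections import Counter, defaultdict


def recurring_hits(searches, min_searches=2):
    """Map each hit seen in at least `min_searches` searches to those searches."""
    seen = defaultdict(set)
    for search_name, hits in searches.items():
        for hit in hits:
            seen[hit].add(search_name)
    return {hit: sorted(names) for hit, names in seen.items()
            if len(names) >= min_searches}


def class_tally(searches, protein_class):
    """Count distinct hits per annotated protein class across all searches."""
    distinct_hits = {hit for hits in searches.values() for hit in hits}
    return Counter(protein_class.get(hit, "unannotated") for hit in distinct_hits)


if __name__ == "__main__":
    # Hypothetical search results and annotations, for illustration only.
    results = {
        "blastp_PAM30": ["P1", "P2"],
        "tblastn_PAM70": ["P2", "P3"],
        "patternfind": ["P2", "P1"],
    }
    classes = {"P1": "kinase", "P2": "kinase", "P3": "nucleic acid binding"}
    print(recurring_hits(results))
    print(class_tally(results, classes))
```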

The output from each search is analyzed for partner hits based on the
following criteria:

1. Search gives an exact match of at least 5-7 amino acids or
appearance of the partner in at least 50% of the top cohort of any one search,
and/or the appearance of the same or related hits occurring in multiple searches.

2. Search matches an expected class of protein partners based on
function, cellular location or tissue/disease distribution.

3. Candidate produces a phenotype change when added into the
appropriate model system.

Preferably, the partner hit has at least two of the criteria described above. 
More preferably, the partner hit appears in at least 50% of the top cohort of any 
one search or appears (or a related sequence appears) in multiple search 
results. Even more preferably, the partner hit has an exact match of at least
5-7 amino acids. Criterion 2 addresses the biological relevance of a hit (e.g.,
distribution, disease indication, etc.), and criterion 3 relates to the biological 
activity of the surrogate and its ability to cause a phenotypic change in the 
appropriate test system. 
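
The three criteria can be checked mechanically once the search outputs are in hand. The sketch below is one possible reading, not a definitive implementation: "at least 50% of the top cohort" is interpreted as the fraction of entries in a search's top hit list that name the candidate, criterion 2 and criterion 3 are passed in as booleans because they depend on annotation and on experiment, and all function and parameter names are hypothetical.

```python
def longest_exact_match(a, b):
    """Length of the longest contiguous stretch shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def cohort_fraction(candidate, top_cohort):
    """Fraction of a search's top hits that are the candidate partner."""
    return top_cohort.count(candidate) / len(top_cohort) if top_cohort else 0.0


def criteria_met(surrogate, partner_seq, partner_id, top_cohorts,
                 in_expected_class, phenotype_change, min_match=5):
    """Return the set of criterion numbers (1, 2, 3) satisfied by a candidate."""
    exact = longest_exact_match(surrogate, partner_seq) >= min_match
    dominant = any(cohort_fraction(partner_id, cohort) >= 0.5
                   for cohort in top_cohorts.values())
    recurring = sum(partner_id in cohort for cohort in top_cohorts.values()) >= 2
    met = set()
    if exact or dominant or recurring:
        met.add(1)
    if in_expected_class:
        met.add(2)
    if phenotype_change:
        met.add(3)
    return met


if __name__ == "__main__":
    # Hypothetical sequences and search output, for illustration only.
    met = criteria_met("ACDWSLYKGH", "MKTWSLYKQQ", "P2",
                       {"s1": ["P2", "P9", "P2", "P2"], "s2": ["P7"]},
                       in_expected_class=True, phenotype_change=False)
    print(sorted(met), "preferred" if len(met) >= 2 else "weak")
```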


The homology between the partner and surrogate can range from being 
scattered over a long stretch (for example 15-25 amino acids) to a perfect match 
within a short sequence (at least 5-8 amino acids). 

The generation of surrogates using large random and diverse libraries is
target independent and their utility for partner identification resides in the
computational analysis of the identified peptide's sequence. For successful
partner identification to be feasible, surrogates must exhibit either the natural
linear or conformational surface properties complementary to the target under
investigation. The complementary peptide surface is selected via a biological
enrichment process (i.e., panning) which is based on preferential binding
potency to the target protein. Since the preferred libraries for use with the
invention contain totally random peptides ranging from 10 up to about 50
amino acids in length (and more preferably 20 to 40 amino acids in length), there
are no known restrictions on the amino acids that can be selected to create the
surrogate's 'complementary' surface. Thus, the examples described herein relate
to the utility of the surrogate approach for finding the cognate receptor for both
protein and non-protein targets. In the case of the surrogates for both HCV-mRNA
and TNF-β, it is clear that the large diversity and size of the original library
were, in fact, critical to their successful isolation since libraries of <20 amino
acid peptides would not have contained either the KcB7 peptide or the HCV-specific
surrogates.

In addition to the data presented in the Examples, we have screened 
other targets using this approach. While the expected natural partners were 
found for most of the proteins, there were some instances where surrogates 
were generated but lacked partner information (e.g., IGF-1R, growth hormone 
receptor, etc.). There are several possible explanations for these results. 


Examples of targets panned and partners revealed by surrogate peptides. 



Target Panned                   Natural Site    Partner Revealed     Phenotype(s) of
                                of Action       by Surrogate         Surrogate Peptides

Coagulation FIX                 ExtraCell       none                 antagonist
TNF-β                           ExtraCell       TNFR1                antagonist
GHR                             Pl.Membr.       none                 antagonist
IgAR                            Pl.Membr.       IgA                  Agonist and antagonist
IGF-1R                          Pl.Membr.       none                 Agonist and antagonist
IR                              Pl.Membr.       none                 Agonist and antagonist
TNFR2                           Pl.Membr.       TNF ligands          Antagonist
TNFR1                           Pl.Membr.       none                 antagonist
TRAIL receptor                  Pl.Membr.       none                 ND
fasR                            Pl.Membr.       fasL                 antagonist
PAB 1620 (anti-p53 antibody)    IntraCell       p53                  NT
MDM-2                           IntraCell       p53                  antagonist
mRNA targets                    IntraCell       RNA binding motif    NT
mRNA HCV                        IntraCell       eIF3                 NT



This Table gives a list of the targets panned using the 20mer and 40mer
random libraries. Column 2 lists the putative site of biological action for each
target. Column 3 describes whether a natural partner was found using a
surrogate peptide found from the panning. Column 4 describes the biological
activity of each surrogate in the appropriate biological assay.

Legend: ExtraCell: Target expressed as an extracellular protein; Pl.Membr.:
Target expressed as a plasma membrane protein; IntraCell: Target expressed
intracellularly; TNF-β, Tumor Necrosis Factor β; IgAR, IgA receptor; GHR,
Growth Hormone receptor; IGF-1R, Insulin-like Growth Factor-1 receptor; IR,
Insulin receptor; TNFR2, Tumor Necrosis Factor Receptor-2 (p75); TNFR1,
Tumor Necrosis Factor receptor-1 (p55); NT = Not Tested.

While the libraries used are large and diverse, it is probable that
identification of a surrogate peptide with partner information is a rare event. With
that in mind, it may require the isolation and sequencing of large numbers of
clones (perhaps >500/target) in order to find the appropriate surrogate for
partner identification. In addition, some targets may have complex or unusual
protein:protein contact sites that preclude generation of a surrogate with partner
information.
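
The relationship between the rarity of informative surrogates and the number of clones to sequence can be made concrete with a simple probability sketch. It assumes, purely for illustration, that each sequenced clone independently has the same probability p of carrying partner information; the value of p used in the example is hypothetical and is not a figure from this application.

```python
import math


def p_at_least_one(p, n):
    """Probability that at least one of n independent clones is informative."""
    return 1.0 - (1.0 - p) ** n


def clones_needed(p, confidence=0.95):
    """Smallest n for which p_at_least_one(p, n) reaches `confidence`."""
    if not 0.0 < p < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical frequency: 1 informative clone per 200 binders.
    p = 0.005
    n = clones_needed(p, confidence=0.95)
    print(n, round(p_at_least_one(p, n), 4))
```

Under that assumed frequency the required number of clones is of the same order as the ">500/target" figure mentioned in the text, which is the only point the sketch is meant to make.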

Surrogates have also been found to have the minimal structural content
necessary to induce a pharmacological effect on any target in addition to their
use in partner identification. Most surrogates have been shown to have either
agonist or antagonist activity in the appropriate biochemical and/or biological
models (see Table above). Surrogates have also been shown to subdivide large
contact surfaces into smaller contact domains through which target activity can
be modified. These attributes provide for surrogate use in phenotyping and
validating novel genes whose functions are unknown and for which there exist no
known partners. Surrogates can also be used to develop competitive Site
Directed Assays (SDAs) for each essential sub-domain, thereby allowing their
use in high throughput screening of large combinatorial libraries of small
molecules. See U.S. Patent 6,010,861. Most peptide surrogates isolated from
these complex libraries by routine panning procedures bind to regulatory hot
spots on varied targets. This non-random association between a surrogate and a
target's "hotspot" (i.e., pharmacological active site) assures a high degree of
probability that, once found, surrogates will have utility for the rapid development
of SDAs capable of identifying small molecules of pharmacological importance.

Selecting Expression Systems Of Original 
Target Gene For Phenotyping With Surrogate 

Two expression systems may be used to assess phenotypic changes 
resulting from binding of the gene product with the surrogate. In one method, 
cells which express the gene product are identified and used as a natural 
expression system. 

Information from EST databases (cDNA libraries used to isolate ESTs, 
and others) is searched for the distribution of expressed cellular and tissue 
mRNA (data collected by Northern blot analysis or other methods including but 
not limited to expression of protein or activity, if available) encoding the gene 
product. To identify high expression systems, surrogates may be labelled (via 
biotin or FITC tags) and used to probe tissue sections, tissue culture cells and 
organisms by immunological or fluorescent detection such as ELISA and FACS. 

Alternatively, if natural expression systems are unavailable, an expression 
system may be created by expressing the gene in cells using standard 
techniques. Because the activity of the gene product may be cell type 
dependent, it is desirable to express the gene in a plurality of cell types. 

Expression And Purification Of Novel Protein Open Reading Frames 

No single heterologous expression system is adequate to produce all 
protein sequences in high yield and as fully folded active entities. In order to 
maximize the chances of recovering an active protein, any unknown new 
sequence should be expressed in multiple expression systems. One method for 
accomplishing this would be to sequentially clone the desired protein into several 
expression vectors optimized for individual cell culture expression systems. 
Alternatively, commercially available systems have been developed to allow 
protein sequences to be cloned and expressed in several cell culture systems 
simultaneously. One such system, the pTriEx™-1 Multisystem Vector, is 
available from Novagen. In this system, the protein sequence to be expressed is 
cloned into a multisystem vector incorporating consecutive CAG, T7lac and p10 
promoters. These three promoters allow high level expression from the single 
vector in mammalian, E. coli and insect cells, respectively. The vector also 
incorporates HSVTag® and HisTag® tags on the C-terminus of expressed 
proteins to facilitate immunochemical detection and affinity purification. 
Expression levels can be checked using anti-HSV antibodies, and the crude 
proteins can be purified to near homogeneity using metal affinity 
chromatography. The purified protein would be suitable for use in biopanning 
and surrogate characterization. 

Detecting Phenotypic Changes 

To detect the activity of the gene, phenotype changes (morphological, 
biochemical, immunological) are observed following contact of the surrogate with 
the cell, tissue or organism. The surrogate may be free or attached to a 
penetrating peptide sequence, as an anti-target probe, in a fashion similar to known 
methods used with anti-sense technology. 

Phenotyping can be done in natural systems if the target/target interaction 
is related to an observable phenotype. Under these conditions there is no need 
to over-express the target in a model cell. 

The overall strategy for determining the functions of a gene by detecting 
changes in phenotype may be summarized as follows: 

I. OBTAIN SURROGATE: 

a. Make gene product of unknown function for panning 

i. Obtain oligoribonucleotides of 5' and 3' untranslated mRNA 
domains 

ii. Obtain full length DNA, express the open reading frame 
(ORF) protein and purify the ORF protein product 

b. Pan peptide libraries (phage, bacterial, yeast, mammalian cell or in 
vitro/ribosomal display) against gene product such as, for example: 

i. Untranslated 3' and 5' mRNA domains, or 

ii. ORF encoded protein 

c. Sequentially make nth generation mutated libraries based on 
panned surrogates' sequences until a limited number of consensus sequences is 
obtained. 

II. USE SURROGATE SEQUENCES TO: 

a. Search databases of translated consensus sequences and identify 
potential partner proteins and genes. 

b. Synthesize (or recombinantly express as a fusion) surrogate 
consensus peptides obtained in the 1st to nth generation pan of peptide display 
libraries either 

i. linked to a cellular uptake peptide leader (such as 
antennapedia) or 

ii. free (i.e., with terminal amino acids for solubility if needed) 

III. USE SOLUBLE SURROGATES TO DETECT CHANGES IN 
PHENOTYPE MEDIATED THROUGH ACTIVATION OF 
THE GENE PRODUCT BY THE SURROGATE 

a. Add surrogates to intact model cells and quantitate effect 

b. Add surrogates to in vitro model system and quantitate effect 

c. Add surrogates at various doses and produce graded phenotypic 
knockouts. 

The following non-limiting examples illustrate various aspects and 
embodiments of the invention and should not be construed as limiting the scope 
of the invention. All references cited herein are incorporated herein by reference 
in their entirety. 

EXAMPLES 

Example 1: Design of 40-mer and 20-mer Random Peptide Libraries 

DNA fragments coding for peptides containing 40 random amino acids 
were generated by a PCR approach using synthetic oligonucleotides. A 145 
base oligonucleotide was synthesized containing the sequence (NNK)40 where N 
= A, C, T, or G and K = G or T. See U.S. Patents 6,143,531, 5,681,726 and 
388, which are hereby incorporated by reference. This oligonucleotide was used 
as the template in PCR reactions along with two shorter oligonucleotide primers, 
both of which were biotinylated at their 5' ends. The resulting 190 bp product was 
purified and concentrated, followed by digestion with SfiI and NotI. The resulting 
150 bp fragment was purified and the phagemid pCANTAB5E (Pharmacia) was 
digested with SfiI and NotI. The digested DNA was resolved using a 1% agarose 
gel, excised and purified by QIAEX II treatment (Qiagen). The vector and insert 
were ligated overnight at 15°C. The ligation product was purified. 
Electrocompetent cells were prepared by harvesting cells from a culture broth 
with an OD of 0.5-0.7 by centrifugation in a fixed rotor for 10 minutes at 
950g. The cells were washed three times with ice cold pure water. 
Electroporations were performed at 1500 V in an electroporation cuvette (0.1 mm 
gap; 0.5 ml volume) containing 12.5 µg DNA and 500 µl of E. coli strain TG1 
electrocompetent cells. Immediately after the pulse, 12.5 ml of pre-warmed 
(42°C) 2x YT medium containing 2% glucose (YT-G) was added and the 
transformants were grown at 37°C for one hour. Cell transformants were pooled, the 
volume measured and an aliquot plated onto 2x YT-G containing 100 µg/ml 
ampicillin (YT-AG) to determine the number of transformants. The diversity of the 
random 40-mer peptide cell library was found to be > 1.6 X 10^°. The phage 
library was produced by rescue of the cell library according to standard phage 
preparation protocols. See e.g., Carcamo, et al. Proc. Natl Acad Sci USA 
(1998) 95: 11146-11151. Phage titers were usually 4 X 10^^ cfu/ml. 

Sequencing of randomly selected clones from the cell library indicated 
that about 54% of all clones were in-frame. The short FLAG sequence, DYKD, 
was included at the N-terminus as an immunoaffinity tag. In addition, the E-tag 
epitope (GAPVPYPDPLEPR) was engineered into the carboxy terminus of the 
peptide. 

A second random phage library of 20-mer peptides was constructed using 
the same approach. The diversity of this cell library was found to be > 1.1 X 10^^ 
clones and sequencing revealed 77% of the clones were in frame. 
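
The arithmetic behind an (NNK)n cassette can be sketched in Python as a rough check on the library design. The function names below are illustrative and are not part of the described method. The sketch assumes the standard genetic code, under which TAG is the only stop codon an NNK codon can encode, and equal incorporation of each allowed base at every position.

```python
from itertools import product


def nnk_codons():
    """All codons allowed by the NNK scheme (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def random_region_length(n_codons):
    """Number of bases contributed by an (NNK)n cassette."""
    return 3 * n_codons


def stop_free_fraction(n_codons):
    """Fraction of (NNK)n cassettes with no in-frame TAG stop codon,
    assuming every NNK codon is equally likely (an idealization)."""
    codons = nnk_codons()
    stops = sum(1 for c in codons if c == "TAG")
    return ((len(codons) - stops) / len(codons)) ** n_codons
```

Under these assumptions the 40-codon cassette accounts for 120 of the 145 bases of the template oligonucleotide, and roughly 0.28 of 40-mer and 0.53 of 20-mer cassettes would be free of stop codons. These idealized figures are a different quantity from the in-frame percentages determined by sequencing, which also reflect synthesis and cloning errors.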



Example 2: Panning TNF-β 

A standard method was used to coat and block all microtiter plates. The 
target was diluted to 1 mg/ml in 50 mM sodium carbonate buffer, pH 9.5. One 
hundred microliters of this solution was added to an appropriate number of wells 
in a 96-well microtiter plate (MaxiSorp plates, Nunc) and incubated overnight at 
4° C. Wells were then blocked with MPBS (PBS containing 2% non fat milk) at 
room temperature for one hour. 

Eight wells were used for each round of panning. The phage from the 
phage library were incubated with MPBS for 30 minutes at room temperature, 
then 100 µl was added to each well. For the first round, the input phage titer 
was 4x10^^ cfu/ml. For rounds 2 and 3, the input phage titer was approximately 
10^^ cfu/ml. Phage were allowed to bind for two to three hours at room 
temperature. The wells were then quickly washed 13 times with 300 µl/well of 
MPBS. Bound phage were eluted by incubation with 100 µl/well of 20 mM 
glycine-HCl, pH 2.2 for 30 seconds. The resulting solution was then neutralized 
with Tris-HCl, pH 8.0. Log phase TG1 cells were infected with the eluted phage 
by incubation at 37 °C for 1 hr. Helper phage (M13K07) was then added 
(multiplicity of infection (MOI) = 15) and cells were incubated in the presence of 50 µg/ml 
ampicillin and 2% glucose for 1 hr at 37 °C with shaking at 250 rpm. Following 
infection, cells were pelleted, resuspended in the initial culture volume of 2xYT 
containing 50 µg/ml ampicillin and 50 ng/ml kanamycin and grown overnight at 
37 °C with shaking at 225 rpm. Cells from the overnight culture were pelleted 
and supernatant containing phage was recovered. Phage was precipitated with 
6% PEG 8000, 300 mM NaCl and chilled on ice for 1 hr. Precipitated phage was 
pelleted by centrifugation at 10,000 x g for 30 min, then resuspended in PBS + 
1 mM MgCl2 (1/100 of the initial volume). The phage was used for the next round 
of panning. 

For ELISA analysis of individual clones, colonies were picked and phage 
prepared as described above using helper phage, M13K07. Microtiter wells 
were coated and blocked as described above. Wells were coated with either 
IGF-1R or a control IgG MAb. Phage were added at 100 µl/well and incubated at 
room temperature for 2 hr. The phage solution was then removed, and the wells 
were washed three times with PBS at room temperature. Anti-M13 antibody 
conjugated to horseradish peroxidase (Pharmacia Biotech) was diluted 1:3000 in 
MPBS and added to each well (100 µl/well). Incubation was for another hour at 
room temperature, followed by PBS washes as described. Color was developed 
by addition of ABTS solution (100 µl/well; Boehringer). Plates were analyzed at 
405 nm using a SpectraMax 340 plate reader (Molecular Devices) and SoftMax 
Pro software. Data points were averaged after subtraction of appropriate blanks. 
A clone was considered "positive" if the A405 of the well was > 2-fold over 
background. 
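
The positive-clone criterion can be illustrated with a minimal Python sketch. The function names, the choice of control wells as the "background", and the example readings are assumptions made for illustration; the text above specifies only blank subtraction, averaging, and a greater than 2-fold threshold.

```python
from statistics import mean


def blank_corrected_mean(readings, blank):
    """Average replicate A405 readings after subtracting the blank."""
    return mean(r - blank for r in readings)


def is_positive(target_wells, control_wells, blank=0.0, fold=2.0):
    """Call a clone positive when its blank-corrected signal on the target
    exceeds `fold` times the blank-corrected background signal.
    Using control-coated wells as background is an assumption."""
    signal = blank_corrected_mean(target_wells, blank)
    background = blank_corrected_mean(control_wells, blank)
    return signal > fold * background
```

For example, hypothetical duplicate readings of 0.9 and 1.1 on target wells against 0.2 on control wells, with a blank of 0.05, would be called positive, whereas a single target reading of 0.3 against the same control would not.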

An additional series of panning experiments was performed using the 
eluted phage from the first panning of TNF-β. This additional panning, a 
subtractive panning, was included to remove any peptides that cross-reacted 
with other members of the TNF family. In particular, the phage was 
subsequently panned against TNFR1, TNFR2 and TNF-α. 

The panning experiments identified a surrogate peptide, KcB7, with the 
amino acid sequence RKEMGGGGGPGWSENLFQ. A BLASTp search using 
several different queries revealed TNFR1, which is the natural biological partner 
of TNF-β. 

BLASTp search results for the TNF-β surrogate peptide KcB7 

Query: WSENLFQ 

                                                                  Score     E 
Sequences producing significant alignments:                       (bits)  Value 

prf||2102238A tumor necrosis factor alpha inhibitor [Homo s...      20    2419 
gb|AAA36756.1| (M60275) TNF receptor [Homo sapiens]                 20    2419 
pdb|1TNR|R Chain R, Tumor Necrosis Factor Receptor P55 ...          20    2419 
pdb|1NCF|A Chain A, Binding Protein, Cytokine Mol_id: 1;Mo...       20    2419 
ref|NP_001056.1| tumor necrosis factor receptor 1 (55kD) >g...      20    2419 

>prf||2102238A tumor necrosis factor alpha inhibitor [Homo sapiens] 
Length = 160 

Score = 20.4 bits (41), Expect = 2419 
Identities = 7/7 (100%), Positives = 7/7 (100%) 

Query: 1   WSENLFQ 7 
           WSENLFQ 
Sbjct: 96  WSENLFQ 102 

>gb|AAA36756.1| (M60275) TNF receptor [Homo sapiens] 
Length = 453 

Score = 20.4 bits (41), Expect = 2419 
Identities = 7/7 (100%), Positives = 7/7 (100%) 

Query: 1   WSENLFQ 7 
           WSENLFQ 
Sbjct: 136 WSENLFQ 142 

>pdb|1TNR|R Chain R, Tumor Necrosis Factor Receptor P55 (Extracellular 
Domain) Complexed With Tumor Necrosis Factor-Beta 
Length = 139 

Score = 20.4 bits (41), Expect = 2419 
Identities = 7/7 (100%), Positives = 7/7 (100%) 


Patternfind search results for the TNF-β surrogate peptide KcB7 

Query sequence: WSENLFQ 

Database: NONREDUNDANT 

Limit: 10 

gp|M60275|339760|AC886035F969E231 TNF receptor [Homo sapiens] 
Occurrences: 1 
Position : 136 WSENLFQ 

sp|P19438|TNR1_HUMAN|4CEFBA96D03B8225 (TNFRSF1A..)TUMOR NECROSIS 
FACTOR RECEPTOR 1 PRECURSOR (TUMOR NECROSIS FACTOR BINDING 
PROTEIN 1) (TBPI) (P60) (TNF-R1) (TNF-RI) (P55) (CD120A).[Homo sapiens] 
Occurrences: 1 
Position : 136 WSENLFQ 

2 matches found 

Closer examination of the complementary sequences revealed that the 
short N-terminal sequence RKEMG and the C-terminal sequence WSENLFQ 
were identical to regions on TNFR1 (amino acids 77-81 and 107-113, 
respectively). These segments corresponded to amino acids within two critical 
ligand:receptor contact domains. In the case of the N-terminal grouping, the 
surrogate contained 5 of the 15 amino acids of the 77-81 contact domain 
whereas in the C-terminal grouping, the surrogate contained 6 of the 9 amino 
acids identified within the 107-113 contact domain. 



Comparison with human TNFR1 extracellular domain 

lYPSGVIGLVPHLGDREKRDSVCPQGRYIHPQNNSICCTKCHKGTYLYNDCPGPG 
QDTDCRECesgsFTASENHLRhcLscSkckkeMgQVEISSCTVDRDTVCGCRKNQYR 
HYWSENLEflcFNCSLCLNGTVHLSCQEKQNTVCTCHAGFFLRENECVSCS 



Contact residues are based on Banner et al., (1993) Cell 73: 431-445. 

Bold = contacted by TNF-β subunit A 
lower case = contacted by TNF-β subunit C 
italics = contacted by both TNF-β subunits A and C 
Underline = homology to the clone 



TNF-β 

LPGVGLTPSAAQTARQHPKMHLAHSTLKPAAlk,/GDP51CeNSLLW/L4A^7i)iL4F 
LQDGFSLSNNSLLVPTSGIYFVYSQVVF^GZAYSPKAPSspLyLAHEVQLFSsqypfH 
vPLLSSqKmVYPGLeEP^LHSMYHGAAFQLTQGDQLSThTdGIP^LKLSPSTVFF 
GAFAL 

Bold = TNF-β subunit A 
lower case = TNF-β subunit C 



Comparison with human TNFR2 extracellular domain 

rKEMGGGGGGpgwSENlFQ 

LPAQVAFTPYAPEPGSTCRLREYYDQTAQMCSCSKCSPGQHAKVFCTKTSDTVCD 
SCEDSTYTQLWNWVPECLSCGSRCSSDQVETOACTrEQNRICTCRpgwYCAlSKQ 
EGCRLCAPLRKCRPGFGVARPGTETSDVVCKPeAPGTFSNTTSSTDICRPHQICN 
VVAIPGNASRDAVCTSTSPT 

Example 3: Panning RNA target 

Surrogate peptides were obtained by panning a portion of the 5'UTR of 
HCV mRNA using both the 20mer and 40mer random libraries. All solutions and 
surfaces were pretreated with DEPC or RNaseZap (Ambion, Inc.), respectively, 
to eliminate RNase contamination that may compromise the integrity of the RNA. 
Biotinylated RNA target was diluted to 1 mg/ml in binding buffer (PBS 
containing 1 mM MgCl2), denatured at 65 °C for 5 min and reannealed by slow 
cooling to room temperature to allow for appropriate refolding. The synthetic 
biotinylated RNA target had the following sequence: 5'-biotin-AA UUG CCA GGA 
CGA CCG GGU CCU UUC UUG GAU CAA CCC GCU CAA UGC CUG GAG 
AUU-3'. Reannealed RNAs were stored in small aliquots (10-25 µl/tube) at -20 
°C. Microtiter wells were treated with RNaseZap (Ambion, Inc.) before use. One 
hundred microliters of RNA solution diluted to 2.5 ng/µl was added to an 
appropriate number of wells in a 96-well microtiter plate precoated with 
streptavidin (Pierce) and incubated for 1 hr at room temperature. Unbound 
streptavidin was then blocked with 50 µl of 2 mM biotin at room temperature for 1 
hr. Four wells were used for each round of panning and 100 µl phage was 
added to each well. Phage were precipitated with RNase-free 6% PEG 8000 + 
0.3 M NaCl, washed with the same solution once and resuspended in 
RNase-free PBS + 1 mM MgCl2 + Superasin (RNase inhibitor from Ambion, Inc.). For 
the first round, the input phage titer was 1 x 10^^ cfu/ml. For rounds 2 and 3, the 
input phage titer was approximately 10^^ cfu/ml. Phage were allowed to bind for 
two to three hours at room temperature. The wells were then quickly washed 13 
times with 400 µl/well of PBS. Bound phage were eluted by incubation with 150 
µl/well of 50 mM glycine-HCl, pH 2.2 + 0.1% BSA for 5 min. The resulting 
solution was then neutralized with Tris-HCl, pH 8.0. Log phase TG1 cells were 
infected with the eluted phage by incubation at 37 °C for 1 hr. Helper phage 
(M13K07) was then added (multiplicity of infection (MOI) = 15) and cells were incubated 
in the presence of 50 µg/ml ampicillin and 2% glucose for 1 hr at 37 °C with 
shaking at 250 rpm. Following infection, cells were pelleted, resuspended in the 
initial culture volume of 2xYT containing 50 µg/ml ampicillin and 50 ng/ml 
kanamycin and grown overnight at 37 °C with shaking at 225 rpm. Cells from the 
overnight culture were pelleted and supernatant containing phage was 
recovered. Phage was precipitated with 6% PEG 8000, 300 mM NaCl and chilled 
on ice for 1 hr. Precipitated phage was pelleted by centrifugation at 10,000 x g 
for 30 min, washed once with the same solution and then resuspended in PBS + 
1 mM MgCl2 (1/100 of the initial volume). 

For ELISA analysis of individual clones, colonies were picked and phage 
prepared as described above using helper phage, M13K07. Streptavidin-coated 
microtiter plates were blocked with PBS containing 2% non fat milk for 1 hr at 
room temperature, treated with RNaseZap, then coated with biotinylated RNA 
target (100 ng/well) by incubation for 1 hr at room temperature. Superasin 
(RNase inhibitor from Ambion, Inc.) was added to the wells prior to addition of 
100 µl/well of phage from isolated clones and incubated at room temperature for 
2 hr. The phage solution was then removed, and the wells were washed three 
times with PBS at room temperature. Anti-M13 antibody conjugated to 
horseradish peroxidase (Pharmacia Biotech) was diluted 1:3000 in PBS (also 
containing Superasin) and added to each well (100 µl/well). Incubation was for 
another hour at room temperature, followed by PBS washes as described. Color 
was developed by addition of ABTS solution (100 µl/well; Boehringer). Plates 
were analyzed at 405 nm using a SpectraMax 340 plate reader (Molecular 
Devices) and SoftMax Pro software. Data points were averaged after subtraction 
of appropriate blanks. A clone was considered "positive" if the A405 of the well 
was > 2-fold over background. 

Peptides HCV-3-F5, HCV-3-H8 and HCV-NG-D9 were obtained from the 
40-mer library. Peptide HCV-3-C3 was obtained from the 20mer library. 
Sequence analysis of these surrogate peptide binders to HCV using MEME 
(Motif Elicitation Program) and other peptide sequence alignment programs 
identified a consensus sequence TxRLL. Database searches using BLAST and 
Patternfind identified a human gene product, subunit p170 of eIF3. The 
consensus sequences are shown below in bold and underlined. Sequences 
outside the motif that are conserved between the surrogates and eIF3 are in 
italics and underlined. 

HCV alignments 

eIF3:       gBLDNIQTPE-SVLLSAVSGEDTQDRTDRLLLTPWVKFLWESY 
CONSENSUS:  TxRLL 
HCV-NG-D9   TSGESSGDETRRVLTSSSARTLPN 
HCV-3-F5    LLVTG<2F^- SQLLLGGAVCGP - - STPRLRTGLCRLSGT 
HCV-3-H8    RRTCGDPAAMi>ERLSCRAGDYRGASHTGRLLNLRGMHQYP 
HCV-3-C3    FTTPRHLSGRTl^^MiyiRDSTS 



OUTPUT FROM ADVANCED BLAST SEARCH FOR HCV mRNA 
SURROGATE QUERY - SEARCH 1: 

Query sequence: TSGESSGDRTRRVLT 

Program: blastp 

Database: swissprot 

Expect value: 10000 



OUTPUT: 

                                                                  Score     E 
Sequences producing significant alignments:                       (bits)  Value 

sp|P31258|HXAB_CHICK HOMEOBOX PROTEIN HOX-A11 (GHOX-II) (CH...      20     477 
sp|P23116|IF3A_MOUSE EUKARYOTIC TRANSLATION INITIATION FACT...      19    1072 
sp|P39690|KHS1_YEAST KILLER TOXIN KHS PRECURSOR (KILLER OF ...      19    1072 
sp|O83264|NUSG_TREPA TRANSCRIPTION ANTITERMINATION PROTEIN ...      19    1072 
sp|P13079|CARB_STRTH RRNA METHYLTRANSFERASE (CARBOMYCIN-RES...      19    1072 
sp|Q14152|IF3A_HUMAN EUKARYOTIC TRANSLATION INITIATION FACT...      19    1072 
sp|P39925|AFG3_YEAST MITOCHONDRIAL RESPIRATORY CHAIN COMPLE...      19    1072 
sp|P16561|HEMA_VACCT HEMAGGLUTININ PRECURSOR                        19    1404 
sp|P52023|DP3B_SYNP7 DNA POLYMERASE III, BETA CHAIN                 19    1404 
sp|P20978|HEMA_VACCC HEMAGGLUTININ PRECURSOR                        19    1404 
sp|P15989|CA36_CHICK COLLAGEN ALPHA 3 (VI) CHAIN PRECURSOR          19    1404 



List truncated here... 
>sp|P31258|HXAB_CHICK HOMEOBOX PROTEIN HOX-A11 (GHOX-II) 
(CHOX-1.9) 
Length = 297 

Score = 20.4 bits (41), Expect = 477 
Identities = 8/11 (72%), Positives = 9/11 (81%) 

Query: 2   SGESSGDRTRR 12 
           SG SSG RTR+ 
Sbjct: 217 SGSSSGQRTRK 227 

>sp|P23116|IF3A_MOUSE EUKARYOTIC TRANSLATION INITIATION 
FACTOR 3 SUBUNIT 10 (EIF-3 THETA) 
(EIF3 P167) (EIF3 P180) (EIF3 P185) (P162 PROTEIN) 
(CENTROSOMIN) 
Length = 1344 

Score = 19.2 bits (38), Expect = 1072 
Identities = 8/13 (61%), Positives = 10/13 (76%) 

Query: 2   SGESSGDRTRRVL 14 
           SGE + DRT R+L 
Sbjct: 133 SGEDTQDRTDRLL 145 

>sp|P39690|KHS1_YEAST KILLER TOXIN KHS PRECURSOR (KILLER OF 
HEAT SENSITIVE) 
Length = 708 

Score = 19.2 bits (38), Expect = 1072 
Identities = 8/13 (61%), Positives = 10/13 (76%) 

Query: 3   GESSGDRTRRVLT 15 
           G+SSG  T+R LT 
Sbjct: 98  GKSSGSATKRGLT 110 

>sp|O83264|NUSG_TREPA TRANSCRIPTION ANTITERMINATION PROTEIN 
NUSG 
Length = 185 

Score = 19.2 bits (38), Expect = 1072 
Identities = 7/12 (58%), Positives = 9/12 (74%) 

Query: 2   SGESSGDRTRRV 13 
           +GE  GDRT R+ 
Sbjct: 117 AGEIKGDRTPRI 128 

>sp|P13079|CARB_STRTH RRNA METHYLTRANSFERASE 
(CARBOMYCIN-RESISTANCE PROTEIN) 
Length = 299 

Score = 19.2 bits (38), Expect = 1072 
Identities = 8/12 (66%), Positives = 8/12 (66%) 

Query: 2   SGESSGDRTRRV 13 
           SG S  DR RRV 
Sbjct: 40  SGRSEADRRRRV 51 

>sp|Q14152|IF3A_HUMAN EUKARYOTIC TRANSLATION INITIATION 
FACTOR 3 SUBUNIT 10 (EIF-3 THETA) 
(EIF3 P167) (EIF3 P180) (EIF3 P185) (KIAA0139) 
Length = 1382 

Score = 19.2 bits (38), Expect = 1072 
Identities = 8/13 (61%), Positives = 10/13 (76%) 

Query: 2   SGESSGDRTRRVL 14 
           SGE + DRT R+L 
Sbjct: 133 SGEDTQDRTDRLL 145 

>sp|P39925|AFG3_YEAST MITOCHONDRIAL RESPIRATORY CHAIN 
COMPLEXES ASSEMBLY PROTEIN AFG3 
(TAT-BINDING HOMOLOG 10) 
Length = 761 

Score = 19.2 bits (38), Expect = 1072 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 2   SGESSGDRTRRVLT 15 
           S  +SGD + RVLT 
Sbjct: 136 SSNNSGDDSNRVLT 149 
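
The "Identities" figures in the alignments above can be reproduced with a short Python sketch. The function name is hypothetical. The sketch handles only ungapped segments of equal length and truncates the percentage to an integer, which is an inference from the reported values (for example 8/13 shown as 61%). The "Positives" count is not reproduced because it depends on a substitution matrix.

```python
def identity_summary(query, subject):
    """Count identical positions in two equal-length, ungapped aligned
    segments; return (matches, length, truncated integer percentage)."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query), (100 * matches) // len(query)
```

Applied to the eIF3 alignment, `identity_summary("SGESSGDRTRRVL", "SGEDTQDRTDRLL")` gives 8 identities over 13 positions.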



OUTPUT FROM ADVANCED BLAST SEARCH FOR HCV mRNA 
SURROGATE QUERY - SEARCH 2: 

Query sequence: TSGESSGDRTRRVLTSSS 

Program: blastp 

Database: swissprot 

Expect value: 10000 

                                                                  Score     E 
Sequences producing significant alignments:                       (bits)  Value 

sp|Q01728|NAC1_RAT SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (NA...      21      65 
sp|P70414|NAC1_MOUSE SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      21      65 
sp|P48765|NAC1_BOVIN SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      20     190 
sp|P48766|NAC1_CAVPO SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      20     190 
sp|P32418|NAC1_HUMAN SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      20     190 
sp|P23685|NAC1_CANFA SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      20     190 
sp|P48767|NAC1_FELCA SODIUM/CALCIUM EXCHANGER 1 PRECURSOR (...      20     190 
sp|P08173|ACM4_HUMAN MUSCARINIC ACETYLCHOLINE RECEPTOR M4           19     249 
sp|P23116|IF3A_MOUSE EUKARYOTIC TRANSLATION INITIATION FACT...      19     249 
sp|Q14152|IF3A_HUMAN EUKARYOTIC TRANSLATION INITIATION FACT...      19     249 
sp|P15656|FGF5_MOUSE FIBROBLAST GROWTH FACTOR-5 PRECURSOR (...      19     327 
sp|P30042|ES1_HUMAN ES1 PROTEIN HOMOLOG PRECURSOR (PROTEIN ...      19     327 
sp|O35491|CLK2_MOUSE PROTEIN KINASE CLK2                            18     428 
sp|P49760|CLK2_HUMAN PROTEIN KINASE CLK2                            18     428 
sp|P15172|MYOD_HUMAN MYOBLAST DETERMINATION PROTEIN 1 (MYOG...      18     428 
sp|O75069|Y481_HUMAN HYPOTHETICAL PROTEIN KIAA0481 (HH1480)         18     428 
sp|P02533|K1CN_HUMAN KERATIN, TYPE I CYTOSKELETAL 14 (CYTOK...      18     561 
sp|P30989|NTR1_HUMAN NEUROTENSIN RECEPTOR TYPE 1 (NT-R-1) (...      18     561 
sp|P30551|CCKR_RAT CHOLECYSTOKININ TYPE A RECEPTOR (CCK-A R...      18     561 
sp|Q08369|GAT4_MOUSE TRANSCRIPTION FACTOR GATA-4 (GATA BIND...      18     561 

List truncated here... 

>sp|Q01728|NAC1_RAT SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 971 

Score = 21.2 bits (43), Expect = 65 
Identities = 9/15 (60%), Positives = 11/15 (73%) 

Query: 3   GESSGDRTRRVLTSS 17 
           GE  G RT ++LTSS 
Sbjct: 933 GELGGPRTAKLLTSS 947 

>sp|P70414|NAC1_MOUSE SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 21.2 bits (43), Expect = 65 
Identities = 9/15 (60%), Positives = 11/15 (73%) 

Query: 3   GESSGDRTRRVLTSS 17 
           GE  G RT ++LTSS 
Sbjct: 932 GELGGPRTAKLLTSS 946 

>sp|P48765|NAC1_BOVIN SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 19.6 bits (39), Expect = 190 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 3   GESSGDRTRRVLTS 16 
           GE  G RT ++LTS 
Sbjct: 932 GELGGPRTAKLLTS 945 

>sp|P48766|NAC1_CAVPO SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 19.6 bits (39), Expect = 190 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 3   GESSGDRTRRVLTS 16 
           GE  G RT ++LTS 
Sbjct: 932 GELGGPRTAKLLTS 945 

>sp|P32418|NAC1_HUMAN SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 19.6 bits (39), Expect = 190 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 3   GESSGDRTRRVLTS 16 
           GE  G RT ++LTS 
Sbjct: 932 GELGGPRTAKLLTS 945 

>sp|P23685|NAC1_CANFA SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 19.6 bits (39), Expect = 190 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 3   GESSGDRTRRVLTS 16 
           GE  G RT ++LTS 
Sbjct: 932 GELGGPRTAKLLTS 945 

>sp|P48767|NAC1_FELCA SODIUM/CALCIUM EXCHANGER 1 PRECURSOR 
(NA+/CA2+-EXCHANGE PROTEIN 1) 
Length = 970 

Score = 19.6 bits (39), Expect = 190 
Identities = 8/14 (57%), Positives = 10/14 (71%) 

Query: 3   GESSGDRTRRVLTS 16 
           GE  G RT ++LTS 
Sbjct: 932 GELGGPRTAKLLTS 945 

>sp|P08173|ACM4_HUMAN MUSCARINIC ACETYLCHOLINE RECEPTOR M4 
Length = 479 

Score = 19.2 bits (38), Expect = 249 
Identities = 8/14 (57%), Positives = 13/14 (92%) 

Query: 5   SSGDRTRRVLTSSS 18 
           SSG+++ R++TSSS 
Sbjct: 10  SSGNQSVRLVTSSS 23 

>sp|P23116|IF3A_MOUSE EUKARYOTIC TRANSLATION INITIATION 
FACTOR 3 SUBUNIT 10 (EIF-3 THETA) 
(EIF3 P167) (EIF3 P180) (EIF3 P185) (P162 PROTEIN) 
(CENTROSOMIN) 
Length = 1344 

Score = 19.2 bits (38), Expect = 249 
Identities = 8/13 (61%), Positives = 10/13 (76%) 

Query: 2   SGESSGDRTRRVL 14 
           SGE + DRT R+L 
Sbjct: 133 SGEDTQDRTDRLL 145 

>sp|Q14152|IF3A_HUMAN EUKARYOTIC TRANSLATION INITIATION 
FACTOR 3 SUBUNIT 10 (EIF-3 THETA) 
(EIF3 P167) (EIF3 P180) (EIF3 P185) (KIAA0139) 
Length = 1382 

Score = 19.2 bits (38), Expect = 249 
Identities = 8/13 (61%), Positives = 10/13 (76%) 

Query: 2   SGESSGDRTRRVL 14 
           SGE + DRT R+L 
Sbjct: 133 SGEDTQDRTDRLL 145 

>sp|P15656|FGF5_MOUSE FIBROBLAST GROWTH FACTOR-5 PRECURSOR 
(FGF-5) (HBGF-5) 
Length = 264 

Score = 18.8 bits (37), Expect = 327 
Identities = 9/16 (56%), Positives = 10/16 (62%) 

Query: 3   GESSGDRTRRVLTSSS 18 
           G+SSG R R   T SS 
Sbjct: 39  GDSSGSRGRSSATFSS 54 

>sp|P30042|ES1_HUMAN ES1 PROTEIN HOMOLOG PRECURSOR (PROTEIN 
KNP-I) (GT335) 
Length = 268 

Score = 18.8 bits (37), Expect = 327 
Identities = 8/18 (44%), Positives = 11/18 (60%) 

Query: 1   TSGESSGDRTRRVLTSSS 18 
           T G+ S   +R VLT S+ 
Sbjct: 93  TKGQPSEGESRNVLTESA 110 

>sp|O35491|CLK2_MOUSE PROTEIN KINASE CLK2 
Length = 499 

Score = 18.4 bits (36), Expect = 428 
Identities = 8/11 (72%), Positives = 8/11 (72%) 

Query: 2   SGESSGDRTRR 12 
           S  SS DRTRR 
Sbjct: 34  SWSSSSDRTRR 44 

>sp|P49760|CLK2_HUMAN PROTEIN KINASE CLK2 
Length = 499 

Score = 18.4 bits (36), Expect = 428 
Identities = 8/11 (72%), Positives = 8/11 (72%) 

Query: 2   SGESSGDRTRR 12 
           S  SS DRTRR 
Sbjct: 34  SWSSSSDRTRR 44 

Database searches using Patternfind at the ISREC server were performed 
using parameters appropriate for short protein queries and were successful in 
identifying a human gene product, subunit p170 of eIF3. Searches using the 
consensus region as the query likewise identified sequence homology to the 
large subunit p170 of eIF3. 
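
A pattern search of this kind can be approximated with a regular expression. The Python sketch below is not the Patternfind implementation; it is a minimal stand-in with hypothetical function names, and it assumes that a lower-case "x" in the query stands for any single residue. The test fragment is the eIF3 segment (residues 133-145) shown in the BLAST alignments above.

```python
import re


def pattern_to_regex(pattern):
    """Translate a motif such as 'DRTxRLL' into a regular expression,
    treating lower-case 'x' as 'any single residue'."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in pattern)


def find_pattern(pattern, sequence, first_residue=1):
    """Return (position, matched_text) for every occurrence, including
    overlapping ones. Positions are 1-based; `first_residue` is the
    residue number of the first character of `sequence`."""
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start() + first_residue, m.group(1))
            for m in regex.finditer(sequence)]
```

Searching the fragment SGEDTQDRTDRLL with `first_residue=133` locates DRTDRLL at position 139, in agreement with the position reported for eIF3 p170.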

Output from Patternfind for HCV mRNA surrogate query 

Query sequence: DRTxRLL 

Database: Nonredundant 

Limit: 10 

sp|Q14152|IF3A_HUMAN|485C01B28D67EBBA (EIF3S10)EUKARYOTIC 
TRANSLATION INITIATION FACTOR 3 SUBUNIT 10 (EIF-3 THETA) (EIF3 P167) 
(EIF3 P180) (EIF3 P185) (KIAA0139).[Homo sapiens] 
Occurrences: 1 
Position : 139 DRTDRLL 

sp|P46373|FAS1_RHOFA|A66B6F3DF1286566 (FAS1..)CYTOCHROME P450 FAS1 
(EC 1.14.-.-).[Rhodococcus fascians] 
Occurrences: 1 
Position : 170 DRTARLL 

sp|P23116|IF3A_MOUSE|F4CAE2169F577712 (EIF3S10..)EUKARYOTIC 
TRANSLATION INITIATION FACTOR 3 SUBUNIT 10 (EIF-3 THETA) (EIF3 P167) 
(EIF3 P180) (EIF3 P185) (P162 PROTEIN) (CENTROSOMIN).[Mus musculus] 
Occurrences: 1 
Position : 139 DRTDRLL 

Example 4: Panning mRNA 

Short linear amino acid domains found in naturally occurring RNA-binding 
proteins were identified in peptides isolated from the random peptide libraries. 
These domains are generic, i.e. general RNA binding protein motifs rather than 
specific RNA binding motifs. Surrogate peptides were obtained by panning a 
portion of the 5'UTR of four different mRNA targets using both the 20mer and 
40mer random libraries as described in Example 3. Isolated phage binders from 
rounds three and four of each pan were sequenced. For each mRNA target, the 
predicted amino acid sequences of the peptide binders were analyzed for both 
overall amino acid content and the occurrence of known RNA-binding motifs and 
consensus domains. All of the peptide binders showed enrichment of arginine 
residues, as would be expected for RNA binding proteins. Also, tryptophan, 
serine, and glycine residues were enriched. The following table gives a 
comparison of the specific amino acid composition of peptide binders with regard 
to their average frequency of occurrence seen within the original unpanned 
library. These data were compared to the actual frequency of occurrence in the 
library before and after panning on the various mRNA targets denoted as M1, 
M2, and M3. All numbers are expressed as a percentage of the expected 
frequency. 





              Arg     Gly     Trp     Ser 
Exp. Freq.    9.4     6.3     3.1     9.4 
Library       9.4    11.6     3.1     7.7 
M1           13.9    12.6     5.2    10.0 
M2           13.0    13.0     4.3     8.3 
M3           12.4    12.3     5.1     9.8 

In addition, several peptides from each pan showed the presence of the 
RGG box, a well-defined RNA-binding motif, as indicated below. RGG 
sequences in each surrogate is in bold and underlined. 



M1-3-B7     RGLFTEWFRGGSWSNYRVTS
M1-3-E8     TDGGRSVISDNVRGGSRLWLWIRHGSWSQAWGPQDAWSSK
M1-3-H6     RVSSAQPGCTSRVRFRCPRGGLLFNGVTSTNPKTGLSNAQ
M1-4-H1     WYVGVLSYWPHLSGGGRLQVRCLIGRGGFGCRGG
M2-3-C1     WPPGRTLSDLIRGGAGARGM
M2-3-C9     SSGGLHRWSALRGGHGHGLA
M2-3-E2     AMRLKPIAFKGPRAGAGWVEVQPCFAAFRAACTRGGSHHH
M2-3-E3     LHAGWDVTAPRRACKGAQGPGLHGRFYCHRGGLCSGLGRC
M2-3-E9     DEQSSLKGKLRGALVRLGMGHAMPHRGGVWPSTGRPSKQG
M2-3-H12    WTPRHGPMRCWRHQSVFPVGAGPHWALWPIKGPRGGRTAC
M2-NG-C7    RKTGSNIWLPLYHKVCPASTRAGNGRGGSRFLWGSMQTNC
M3-3-B9     RLQRRGGGAVAWWVGFGVGLLWGRLLL 1 1 LGWVLMWFLS
M3-3-C2     QHSEHGGTEWRKRGGMAFAASFLCMRDSYRTTRLRSLLG
M3-3-C7     GTRHVINRVRDSSGVPCKRFGGLQFSQMGKCTIPRGGA
M4-NG-A4    VLRGGSVGKGSLMWCQEVDWRTGGPRSNLWGLWNGRQPPK
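
Because the bold and underlined marking does not survive in plain text, the
RGG positions can be recovered by a simple scan. The following is a minimal
sketch using three of the sequences listed above; the function name find_motif
and the 1-based position convention are assumptions made for this
illustration.

```python
import re

# Three of the surrogate sequences listed above (clone name -> peptide).
PEPTIDES = {
    "M1-3-B7": "RGLFTEWFRGGSWSNYRVTS",
    "M2-3-C1": "WPPGRTLSDLIRGGAGARGM",
    "M4-NG-A4": "VLRGGSVGKGSLMWCQEVDWRTGGPRSNLWGLWNGRQPPK",
}


def find_motif(sequence, motif="RGG"):
    """Return the 1-based start position of every motif hit.

    A zero-width lookahead is used so that overlapping hits are reported.
    """
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    for clone, peptide in PEPTIDES.items():
        print(clone, find_motif(peptide))
```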



Furthermore, one sequence obtained by panning the 20-mer random peptide
library on target M1 contained the KH motif, which is also a known RNA-binding
motif. The surrogate motif corresponding to the KH domain is in bold and
underlined.

KH motif    VIGxxGxxF

M1-3-C6     GVIGGRGLLFPLSGFLHQHR
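
A motif written with "x" as a wildcard, as above, can be checked against a
peptide by translating it into a regular expression. The following is a
minimal sketch; the helper name motif_to_regex is an assumption introduced for
this illustration, and lower-case "x" is taken to mean any single residue.

```python
import re


def motif_to_regex(motif):
    """Compile a motif in which lower-case 'x' stands for any one residue."""
    parts = ["." if ch == "x" else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


kh_pattern = motif_to_regex("VIGxxGxxF")
hit = kh_pattern.search("GVIGGRGLLFPLSGFLHQHR")

if __name__ == "__main__":
    if hit:
        # Report the matched stretch and its 1-based start position.
        print(hit.group(), hit.start() + 1)
```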

Example 5: Panning of Tie-1 (pro-angiogenic tyrosine kinase)

Surrogates for Tie-1 were identified by panning against Tie-1. Six wells
of a 96-well microtiter plate were coated with Tie-1 extracellular domain (R&D
Systems) at concentrations ranging from 50 to 500 ng/well. The plates were
incubated overnight at 4°C. At the same time, an aliquot of E. coli strain TG1
was inoculated into 2x YT medium and grown overnight at 37°C. The next day,
unbound antigen was removed and the coated wells were blocked with 300 µl of
2% non-fat milk in PBS (NFM-PBS) for one hour at room temperature. The plates
were then washed 3 times with PBS. The phage libraries were thawed and mixed
with 0.1 vol of PBS-2% non-fat milk (NFM); 100 µl of each library was added to
the antigen-coated wells, and the plates were incubated for 3 hours at room
temperature. Each well was washed 13 times with PBS-2% NFM and the phage were
eluted with 100 µl of 50 mM glycine-HCl containing 0.1% BSA (pH 2.2) following
a five-minute incubation. The eluted phage from each library were pooled,
neutralized with 100 µl of 1 M Tris-HCl (pH 8.0), added to 10 ml of log-phase
E. coli TG1 (OD600 = 1.0), and amplified in 2x YT-glucose medium for one hour
at 37°C. Helper phage (M13K07) and ampicillin were then added and the cells
were incubated for an additional hour at 37°C. The cells were pelleted at 3500
RPM for 20 minutes, resuspended in 2x YT-AK medium (YT medium containing
ampicillin and kanamycin) and incubated overnight at 37°C. The next day, the
infected bacterial cells were centrifuged at 3500 RPM at 4°C for 15 minutes and
the pellet was discarded. The supernatant, which contained the phage, was
precipitated with % volume of 30% PEG-8000 in 1.6 M NaCl by incubating on ice
for 1 hour. The precipitate was centrifuged at 10,000 RPM at 4°C for 30 minutes
and the phage pellet was resuspended in about 1 ml of NFM-PBS. The phage was
then used for the next round of panning. Three to four rounds of panning were
done for both the 20-mer and 40-mer libraries. Two to three hundred random
clones were picked from rounds 3 and 4 and grown in 96-well cluster plates as a
master stock.

For screening, 40 µl of master stock was transferred from each master to
another set of cluster tubes containing 400 µl of 2x YT-AG and helper phage
(final concentration of 5X10^°/ml). The tubes were incubated at 37°C with
constant shaking for two hours. The cultures were centrifuged at 2500 x g at 4°C
for 20 minutes, the supernatant was discarded, and the bacterial pellet was
resuspended in 400 µl of 2x YT-AK medium and incubated overnight at 37°C. The
cells were then removed by centrifugation at 2500 x g, and the supernatants
were transferred to a new set of cluster tubes and used in ELISA or stored at
4°C.

Each well of a MaxiSorp plate (Nunc) was coated with 100 µl of target
(1 µg/ml) overnight at 4°C. The wells were blocked with NFM-PBS for 1.5 hours
at room temperature. Phage was added at 100 µl/well and the plates were
incubated for 3 hours at room temperature. After washing 3x with PBS-Tween, the
plates were probed with an anti-M13 antibody conjugated to horseradish
peroxidase (1:3000 in PBS-NFM) for 1 hour at room temperature, followed by
addition of 100 µl of ABTS for 15-30 minutes at room temperature. The OD was
measured using a SpectraMax Microplate Spectrophotometer (Molecular Devices) at
405 nm after a 30 minute incubation at room temperature.

A total of 104 binders were sequenced, yielding 32 unique sequences.
Several different peptide motifs were identified that selectively bind to the
Tie-1 receptor but not to other tyrosine kinases (insulin receptor, IGF-1R).
The criterion for a positive clone was a >2-fold difference vs. an unrelated
target. The results of the following database searches identified
mannose-binding protein associated serine protease 2 (MASP-2) as a natural
partner.
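
The >2-fold selection criterion stated above can be written as a simple filter
over paired ELISA readings. The following is a minimal sketch only: the
function name positive_clones, the clone names, and all OD values are
hypothetical and were invented to exercise the rule; none of them are data
from this example.

```python
def positive_clones(od_target, od_control, min_fold=2.0):
    """Return clones whose target signal exceeds min_fold times the control.

    od_target and od_control map clone name -> OD reading on the target and
    on an unrelated target, respectively.
    """
    hits = []
    for clone, signal in od_target.items():
        background = od_control[clone]
        if background > 0 and signal / background > min_fold:
            hits.append(clone)
    return hits


# Hypothetical OD405 readings, invented purely for illustration.
tie1_readings = {"clone_A": 1.20, "clone_B": 0.30, "clone_C": 0.85}
unrelated_readings = {"clone_A": 0.15, "clone_B": 0.25, "clone_C": 0.45}

if __name__ == "__main__":
    print(positive_clones(tie1_readings, unrelated_readings))
```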
Sequences of Peptide Binders to Tie-1

Consensus : GxAWFLDRWGNP 

>RPT13 SLWGCSGRAVLFLDSVGNPTGTFVRC 

>RPT9 RRVDAGGAWYLDRWGNVSV 

>RPT34 WFLDRWGNPQYLGVKASGG 

TI1-G11-R40 GPFSWLFETEWGNPKTVPFGADRWNRHGRWDPGPVSDYGT 
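
One way to see how each binder relates to the consensus is to slide the
consensus along the peptide without gaps and count identical positions. The
following is a minimal sketch under stated assumptions: the function name
best_ungapped_match and the scoring rule (identities only, "x" positions
ignored, no gaps) are choices made for this illustration and are not the
procedure used to derive the consensus.

```python
CONSENSUS = "GxAWFLDRWGNP"

# Peptide binders copied from the list above.
PEPTIDES = {
    "RPT13": "SLWGCSGRAVLFLDSVGNPTGTFVRC",
    "RPT9": "RRVDAGGAWYLDRWGNVSV",
    "RPT34": "WFLDRWGNPQYLGVKASGG",
    "TI1-G11-R40": "GPFSWLFETEWGNPKTVPFGADRWNRHGRWDPGPVSDYGT",
}


def best_ungapped_match(peptide, consensus=CONSENSUS):
    """Return (identities, offset) for the best gap-free placement.

    offset is the 0-based peptide index aligned with consensus position 0;
    it is negative when the consensus overhangs the peptide's N-terminus.
    """
    best = (0, 0)
    for offset in range(-(len(consensus) - 1), len(peptide)):
        identities = 0
        for i, residue in enumerate(consensus):
            j = offset + i
            if residue != "x" and 0 <= j < len(peptide) and peptide[j] == residue:
                identities += 1
        if identities > best[0]:
            best = (identities, offset)
    return best


if __name__ == "__main__":
    for name, peptide in PEPTIDES.items():
        print(name, best_ungapped_match(peptide))
```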

Results of Advanced Blast Search 

Reference:

Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang,
Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs", Nucleic
Acids Res. 25:3389-3402.

RID: 988980952-24595-10839 

Query= RPT9 : RRVDAGGAWYLDRWGNVSV 
(20 letters) 



Non-redundant SwissProt sequences 
96,103 sequences; 35,068,824 total letters 



                                                                 Score     E
Sequences producing significant alignments:                      (bits)  Value

gi|7387859|sp|O00187|MASP2_HUMAN MANNAN-BINDING LECTIN SERIN...    29     0.17



Alignments 

>gi|7387859|sp|O00187|MASP2_HUMAN MANNAN-BINDING LECTIN SERINE PROTEASE 2
PRECURSOR (MANNOSE-BINDING PROTEIN ASSOCIATED SERINE PROTEASE 2) (MASP-2)
Length = 686

Score = 29.1 bits (61), Expect = 0.17
Identities = 13/25 (52%), Positives = 16/25 (64%), Gaps = 7/25 (28%)

Query: 2 RVDAGGAWYLD RW---GNVS 19 

R D+GGA+V+LD RW G VS 
Sbjct: 630 RGDSGGALVFLDSETERWFVGGIVS 654 
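
The figures in the report above can be sanity-checked with a little
arithmetic. The following is a minimal sketch, assuming the standard relation
E = m * n * 2^(-S') between a bit score S' and the expect value, with m and n
taken as the raw query and database lengths given in the report. BLAST itself
uses shorter effective lengths, so this raw estimate is expected to come out
larger than the reported value of 0.17; the sketch does not attempt that
correction. The function names are assumptions introduced for illustration.

```python
def expect_value(bit_score, query_len, db_len):
    """Raw-length estimate of E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def percent(numerator, denominator):
    """Return a ratio as a whole-number percentage."""
    return round(100.0 * numerator / denominator)


# Values taken from the report above.
e_raw = expect_value(bit_score=29.1, query_len=20, db_len=35_068_824)

if __name__ == "__main__":
    print(f"raw-length E estimate: {e_raw:.2f}")
    print("identities %:", percent(13, 25))
    print("positives  %:", percent(16, 25))
    print("gaps       %:", percent(7, 25))
```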



Example 6: Generation of Agonist/Antagonist Assays for Tie-1 Receptor for
Determining Phenotypic Effects of Surrogates:

Surrogates are tested for agonist and/or antagonist activity in cell lines
expressing both full-length Tie-1 and a chimeric receptor containing the
extracellular domain of Tie-1 and the cytoplasmic region of the epidermal growth
factor receptor (EGFR). The EGFR was chosen because: a) both the EGFR and
Tie-1 are receptor tyrosine kinases; b) both appear to signal following
dimerization; and c) there is an extensive body of information regarding EGFR
signal transduction pathways and the downstream events involved in
transcription and cell growth.

Several models are used, including cell proliferation and gene reporter
assays. In the proliferation models, full-length and chimeric Tie-1 are
transfected into the IL-3-dependent cell line FDC. After selection, these cells
proliferate in the presence of a putative Tie-1 agonist. In gene reporter
assays, various gene-reporter systems are used, including STAT (signal
transducer and activator of transcription)-luciferase, STAT-GFP (green
fluorescent protein), SRE (serum response element)-luciferase and SRE-GFP.
Co-transfection experiments establish cell lines expressing either full-length
or chimeric Tie-1. These STAT and SRE lines allow the high-throughput screening
of phage clones to determine the putative bioactivity of the peptide
surrogates. See Carcamo, et al., Proc. Natl. Acad. Sci. USA 95:11146-11151.

The complete ORF of the Tie-1 gene is cloned from fetal human brain
(Clontech Quick-Clone cDNA) or fetal human heart using the following primers:

5' Tie-1 forward: GGT CGG CCT CTG GAG TAT GGT CTG

3' Tie-1 reverse: TCC TTG AGG CAG CTT AAG TCA GAG

The complete ORF of the EGFR gene is cloned from the above libraries
or from a placental cDNA library (Clontech Placenta Marathon-Ready cDNA)
using the following primers:

5' EGFR forward: GGA GCA GCG ATG CGA CCC TC

3' EGFR reverse: GGT CCT GGG TAT CGA AAG AGT CTG G

In the chimeric receptor, the extracellular and transmembrane regions of
Tie-1 are joined to the cytoplasmic kinase domain of the EGFR with an Nhe I
site, which adds the amino acids alanine and serine at the junction. The
primers for generating the chimeric receptor are the following (with the
Nhe I site underlined):

EGFR forward: GCG CTG CTA GCC GAA GGC GCC ACA TCG TTC

Tie-1 reverse: GCT GCT GCT AGC GAT GCA CAC CAG GGT TAA AAG G
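
Since the underlining is lost in plain text, the restriction site can be
located programmatically, and the alanine-serine junction can be checked at
the same time. The following is a minimal sketch, assuming the Nhe I
recognition sequence GCTAGC read in frame as GCT AGC; the helper names and the
two-entry codon table are assumptions introduced for this illustration.

```python
NHEI_SITE = "GCTAGC"  # Nhe I recognition sequence

# Only the two codons needed to translate the site itself.
CODONS = {"GCT": "A", "AGC": "S"}

# Chimeric-receptor primers copied from the text above.
PRIMERS = {
    "EGFR forward": "GCG CTG CTA GCC GAA GGC GCC ACA TCG TTC",
    "Tie-1 reverse": "GCT GCT GCT AGC GAT GCA CAC CAG GGT TAA AAG G",
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_position(primer, site=NHEI_SITE):
    """Return the 0-based index of the site (spaces ignored), or -1."""
    return primer.replace(" ", "").find(site)


def translate_site(site=NHEI_SITE):
    """Translate the site codon by codon using the small table above."""
    return "".join(CODONS[site[i:i + 3]] for i in range(0, len(site), 3))


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, "site at index", site_position(primer))
    print("junction residues:", translate_site())
    print("palindromic:", reverse_complement(NHEI_SITE) == NHEI_SITE)
```

Because the site is its own reverse complement, it reads GCT AGC on either
strand, which is consistent with the same two residues being introduced
whichever primer carries it.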

Both the full-length Tie-1 and the chimeric receptor are cloned into
pcDNA 3.1 for transfection experiments.

The various target cell lines are used to screen for surrogate peptides
with agonist and antagonist activity. The surrogates are used as
peptidomimetics or for the generation of Site Directed Assays and small
molecule discovery via high-throughput screening.